Day 21 & 22

Author

Maya McCain

Day 21

library(dataRetrieval)
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
library(tidymodels)
── Attaching packages ────────────────────────────────────── tidymodels 1.3.0 ──
✔ broom        1.0.7     ✔ rsample      1.2.1
✔ dials        1.4.0     ✔ tune         1.3.0
✔ infer        1.0.7     ✔ workflows    1.2.0
✔ modeldata    1.4.0     ✔ workflowsets 1.1.0
✔ parsnip      1.3.0     ✔ yardstick    1.3.2
✔ recipes      1.1.1     
── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
✖ scales::discard() masks purrr::discard()
✖ dplyr::filter()   masks stats::filter()
✖ recipes::fixed()  masks stringr::fixed()
✖ dplyr::lag()      masks stats::lag()
✖ yardstick::spec() masks readr::spec()
✖ recipes::step()   masks stats::step()
library(zoo)

Attaching package: 'zoo'

The following objects are masked from 'package:base':

    as.Date, as.Date.numeric
library(lubridate)
library(tsibble)
Registered S3 method overwritten by 'tsibble':
  method               from 
  as_tibble.grouped_df dplyr

Attaching package: 'tsibble'

The following object is masked from 'package:zoo':

    index

The following object is masked from 'package:lubridate':

    interval

The following objects are masked from 'package:base':

    intersect, setdiff, union
library(feasts)
Loading required package: fabletools

Attaching package: 'fabletools'

The following object is masked from 'package:yardstick':

    accuracy

The following object is masked from 'package:parsnip':

    null_model

The following objects are masked from 'package:infer':

    generate, hypothesize
library(plotly)

Attaching package: 'plotly'

The following object is masked from 'package:ggplot2':

    last_plot

The following object is masked from 'package:stats':

    filter

The following object is masked from 'package:graphics':

    layout
# Example: Cache la Poudre River at Mouth (USGS site 06752260)
poudre_flow <- readNWISdv(siteNumber = "06752260",    # Download data from USGS for site 06752260
                          parameterCd = "00060",      # Parameter code 00060 = discharge in cfs)
                          startDate = "2013-01-01",   # Set the start date
                          endDate = "2023-12-31") |>  # Set the end date
  renameNWISColumns() |>                              # Rename columns to standard names (e.g., "Flow", "Date")
  mutate(Date = yearmonth(Date)) |>                   # Convert daily Date values into a year-month format (e.g., "2023 Jan")
  group_by(Date) |>                                   # Group the data by the new monthly Date
  summarise(Flow = mean(Flow))                       # Calculate the average daily flow for each month
GET:https://waterservices.usgs.gov/nwis/dv/?site=06752260&format=waterml%2C1.1&ParameterCd=00060&StatCd=00003&startDT=2013-01-01&endDT=2023-12-31

Convert to a tsibble

poud_tbl <- as_tsibble(poudre_flow)
Using `Date` as index variable.
head(poud_tbl)
# A tsibble: 6 x 2 [1M]
      Date   Flow
     <mth>  <dbl>
1 2013 Jan  18.1 
2 2013 Feb  18.0 
3 2013 Mar   8.21
4 2013 Apr   5.94
5 2013 May 333.  
6 2013 Jun 300.  

Plotting the time series

flow_plot <- ggplot(poudre_flow, aes(x = Date, y = Flow)) +
  geom_line(color = "darkblue") +
  labs(title = "Cache la Poudre River Streamflow",
       x = "Date", y = "Flow (cfs)") +
  theme_minimal()
ggplotly(flow_plot)

Subseries

gg_subseries(poud_tbl) +
  labs(title = "Monthly Poudre River Streamflow Patterns", y = "Flow", x = "Year") + 
  theme_minimal()
Plot variable not specified, automatically selected `y = Flow`

I notice that streamflow is relatively low October through March consistenetly throughout the 10 year period. May and June have significantly higher streamflows than the rest of the months while April, July, August, and September have slightly higher flows than October through March. I also see that there was abnormally high streamflows in September of 2014 which may indicate a flood. Seasons are not specifically defined in this plot, however streamflow is divided up by month. We can assume that there is a season of low flows (September-March) and a season of high flows (April-August). The subseries represents how streamflows during January behaves across the ten year period. This allows us to compare how the months behave over the years.

Decompose

poud_decomp <- stl(poud_tbl, s.window = "periodic") |>
  plot()

The seasonal component is most appropriate for Poudre River streamflow data. The plot shows various patterns in flows over a ten year period. The seasonal window depicts streamflow patterns throughout the months of the year, most likely related to snowmelt patterns. Seasonal streamflow has stayed relatively the same in the last ten years. The trend window depicts streamflow levels over the ten year period, most likely related to long term influences such as climate change. Trend streamflows have decreased in the last ten years.

Day 22

library(modeltime)
library(tidymodels)
library(timetk)

Modeltime 12 Month Prediction

poud_pred_tbl <-  tsibble::as_tsibble(poudre_flow) |> 
  as_tibble() |>
  mutate(date = as.Date(Date), Date = NULL) 
Using `Date` as index variable.
splits <- time_series_split(poud_pred_tbl, assess = "12 months", cumulative = TRUE)
Using date_var: date
poud_training <-  training(splits)
poud_testing  <-  testing(splits)

Prophet Model

proph_mod <- arima_reg() |>  set_engine("auto_arima")
arima_mod <- prophet_reg() |> set_engine("prophet")

mods <- list(
  fit(proph_mod, Flow ~ date, data = poud_training),
  fit(arima_mod, Flow ~ date, data = poud_training)
)
frequency = 12 observations per 1 year
Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
models_tbl <- as_modeltime_table(mods)
print(models_tbl)
# Modeltime Table
# A tibble: 2 × 3
  .model_id .model   .model_desc            
      <int> <list>   <chr>                  
1         1 <fit[+]> ARIMA(0,0,2)(0,1,2)[12]
2         2 <fit[+]> PROPHET                
calibration_table <- models_tbl |> 
  modeltime_calibrate(new_data = poud_testing)
print(calibration_table)
# Modeltime Table
# A tibble: 2 × 5
  .model_id .model   .model_desc             .type .calibration_data
      <int> <list>   <chr>                   <chr> <list>           
1         1 <fit[+]> ARIMA(0,0,2)(0,1,2)[12] Test  <tibble [12 × 4]>
2         2 <fit[+]> PROPHET                 Test  <tibble [12 × 4]>
modeltime_accuracy(calibration_table) |> 
  arrange(mae)
# A tibble: 2 × 9
  .model_id .model_desc             .type   mae  mape  mase smape  rmse   rsq
      <int> <chr>                   <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1         1 ARIMA(0,0,2)(0,1,2)[12] Test   105.  86.2 0.380  50.6  246. 0.965
2         2 PROPHET                 Test   161. 381.  0.582 161.   228. 0.855
(forecast <- calibration_table |> 
  modeltime_forecast(h = "12 months", 
                     new_data = poud_testing,
                     actual_data = poud_tbl) )
Error: No date or date-time identified.
Warning: Unknown or uninitialised column: `.key`.
# Forecast Results
  
Conf Method: conformal_default | Conf Interval: 0.95 | Conf By ID: FALSE
(GLOBAL CONFIDENCE)
# A tibble: 12 × 7
   .model_id .model_desc .key       .index       .value .conf_lo .conf_hi
       <int> <chr>       <fct>      <date>        <dbl>    <dbl>    <dbl>
 1         2 PROPHET     prediction 2024-01-01 -127.       -584.     329.
 2         2 PROPHET     prediction 2024-02-01 -132.       -588.     324.
 3         2 PROPHET     prediction 2024-03-01 -121.       -578.     335.
 4         2 PROPHET     prediction 2024-04-01  -67.1      -523.     389.
 5         2 PROPHET     prediction 2024-05-01  706.        249.    1162.
 6         2 PROPHET     prediction 2024-06-01  770.        314.    1227.
 7         2 PROPHET     prediction 2024-07-01   -0.832    -457.     456.
 8         2 PROPHET     prediction 2024-08-01  -58.1      -514.     398.
 9         2 PROPHET     prediction 2024-09-01   -7.89     -464.     448.
10         2 PROPHET     prediction 2024-10-01 -101.       -557.     356.
11         2 PROPHET     prediction 2024-11-01 -135.       -591.     322.
12         2 PROPHET     prediction 2024-12-01 -128.       -584.     329.
plot_modeltime_forecast(forecast)
refit_tbl <- calibration_table |>
    modeltime_refit(data = poud_pred_tbl)
frequency = 12 observations per 1 year
Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
refit_tbl |>
    modeltime_forecast(h = "12 months", actual_data = poud_tbl) |>
    plot_modeltime_forecast()
Error: No date or date-time identified.
Warning: Unknown or uninitialised column: `.key`.

Data Retrieval

# Example: Cache la Poudre River at Mouth (USGS site 06752260)
real_poudre_flow <- readNWISdv(siteNumber = "06752260",    # Download data from USGS for site 06752260
                          parameterCd = "00060",      # Parameter code 00060 = discharge in cfs)
                          startDate = "2024-01-01",   # Set the start date
                          endDate = "2024-12-31") |>  # Set the end date
  renameNWISColumns() |>                              # Rename columns to standard names (e.g., "Flow", "Date")
  mutate(month = yearmonth(Date)) |>                   # Convert daily Date values into a year-month format (e.g., "2023 Jan")
  group_by(month) |>                                   # Group the data by the new monthly Date
  summarise(real_flow = mean(Flow))                       # Calculate the average daily flow for each month
GET:https://waterservices.usgs.gov/nwis/dv/?site=06752260&format=waterml%2C1.1&ParameterCd=00060&StatCd=00003&startDT=2024-01-01&endDT=2024-12-31
preds <- forecast %>%
  filter(.key == "prediction") %>%
  select(month = .index, predicted = .value)

comparison <- left_join(preds, real_poudre_flow, by = "month") %>%
  drop_na()

R-Squared

rsq_val <- lm(real_flow ~ predicted, data = comparison) %>%
  glance() %>%
  pull(r.squared)
print(rsq_val)
[1] 0.8249846

The R-Squared value between predicted and observed flows is 0.82. The model explains 82% of the variance of the real streamflow in 2024.

Predicted vs. Observed Plot

ggplot(comparison, aes(x = real_flow, y = predicted)) +
  geom_point(size = 3, color = "blue") +
  geom_smooth(method = "lm", color = "red", se = FALSE) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed", color = "gray40") +
  labs(
    title = "Predicted vs Real Monthly Streamflow in 2024",
    x = "Real Flow (cfs)",
    y = "Predicted Flow (cfs)") +
  theme_minimal()
`geom_smooth()` using formula = 'y ~ x'